amino-acid sequence
Protein as a Second Language for LLMs
Chen, Xinhui, Li, Zuchao, Gao, Mengqi, Zhang, Yufeng, Leong, Chak Tou, Li, Haoyang, Chen, Jiaqi
Deciphering the function of unseen protein sequences is a fundamental challenge with broad scientific impact, yet most existing methods depend on task-specific adapters or large-scale supervised fine-tuning. We introduce the "Protein-as-Second-Language" framework, which reformulates amino-acid sequences as sentences in a novel symbolic language that large language models can interpret through contextual exemplars. Our approach adaptively constructs sequence-question-answer triples that reveal functional cues in a zero-shot setting, without any further training. To support this process, we curate a bilingual corpus of 79,926 protein-QA instances spanning attribute prediction, descriptive understanding, and extended reasoning. Empirically, our method delivers consistent gains across diverse open-source LLMs and GPT-4, achieving up to 17.2% ROUGE-L improvement (average +7%) and even surpassing fine-tuned protein-specific language models. These results highlight that generic LLMs, when guided with protein-as-language cues, can outperform domain-specialized models, offering a scalable pathway for protein understanding in foundation models.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Education > Health & Safety > School Nutrition (0.52)
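As a rough illustration of the exemplar-based prompting described in the "Protein as a Second Language for LLMs" abstract above, the sketch below picks sequence-question-answer triples for a query sequence and lays them out as context for an LLM. It is not the authors' implementation: the `Exemplar` type, the k-mer Jaccard similarity used for selection, and the prompt layout are all assumptions made for this example, and the abstract does not say how the adaptive construction actually works.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """Hypothetical container for one sequence-question-answer triple."""
    sequence: str
    question: str
    answer: str


def kmers(seq: str, k: int = 3) -> set:
    """Return the set of overlapping length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_exemplars(query_seq: str, pool: list, n: int = 2, k: int = 3) -> list:
    """Pick the n pool entries most similar to the query.

    Assumption: k-mer overlap stands in for whatever adaptive
    selection the framework really uses.
    """
    query_kmers = kmers(query_seq, k)
    ranked = sorted(
        pool,
        key=lambda ex: jaccard(query_kmers, kmers(ex.sequence, k)),
        reverse=True,
    )
    return ranked[:n]


def build_prompt(query_seq: str, question: str, exemplars: list) -> str:
    """Lay out the exemplars, then the unanswered query, as plain text."""
    blocks = [
        f"Sequence: {ex.sequence}\nQuestion: {ex.question}\nAnswer: {ex.answer}"
        for ex in exemplars
    ]
    blocks.append(f"Sequence: {query_seq}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(blocks)
```

The resulting string would be sent to a general-purpose LLM with no further training; only the choice of exemplars changes from query to query.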
AlphaFold Spreads through Protein Science
Two years ago, as the COVID-19 pandemic swept across the world, researchers at DeepMind, the artificial intelligence (AI) and research laboratory subsidiary of Alphabet Inc., demonstrated how it could use machine learning to achieve a breakthrough in the ability to predict how proteins, the workhorses of the living cell, fold into the intricate shapes they take on. The work gave hope to biologists that they could use this kind of tool to tackle diseases such as those caused by the SARS-CoV-2 coronavirus much more quickly in the future. Researchers were able to assess the abilities of DeepMind's AlphaFold2 thanks to its inclusion in the 14th Critical Assessment of Structure Prediction (CASP14), a benchmarking competition that ran through 2020 and which added a parallel program to uncover the structures of key proteins from the SARS-CoV-2 virus to try to accelerate vaccine and drug development. The organizers of CASP14 declared the tool represented "an almost complete solution to the problem of computing three-dimensional structure from amino-acid sequences," though some caveats lie behind that statement. In principle, quantum mechanical simulations can predict which collection of folds leads to the lowest combined energy of all the chemical bonds in the shape and the water and other molecules around it.
- North America > United States > New Mexico (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- Europe > United Kingdom > England > Surrey (0.05)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
Physics - Machine-Learning Model Reveals Protein-Folding Physics
Proteins control every cell-level aspect of life, from immunity to brain activity. They are encoded by long sequences of compounds called amino acids that fold into large, complex 3D structures. Computational algorithms can model the physical amino-acid interactions that drive this folding [1]. But determining the resulting protein structures has remained challenging. In a recent breakthrough, a machine-learning model called AlphaFold [2] predicted the 3D structure of proteins from their amino-acid sequences.
- North America > United States > Massachusetts (0.05)
- North America > United States > Maine (0.05)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Cologne (0.05)
Macromolecule Classification Based on the Amino-acid Sequence
Ghaffar, Faisal, Khan, Sarwar, O., Gaddisa, Yu-jhen, Chen
Deep learning is playing a vital role in every field that involves data. It has emerged as a strong and efficient framework that can be applied to a broad spectrum of complex learning problems that were difficult to solve with traditional machine learning techniques. In this study we focus on the classification of protein sequences with deep learning techniques. The study of amino-acid sequences is vital in the life sciences. We used different word embedding techniques from natural language processing to represent amino-acid sequences as vectors. Our main goal was to classify sequences into four classes: DNA, RNA, protein, and hybrid. After several tests we achieved almost 99% training and test accuracy. We experimented with CNN, LSTM, bidirectional LSTM, and GRU models.
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Education > Health & Safety > School Nutrition (0.80)
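The "Macromolecule Classification" abstract above mentions representing sequences as vectors with word embedding techniques. One common way to prepare a sequence for an embedding layer is to split it into overlapping k-mer "words" and map each to an integer id. The sketch below shows that preprocessing step only; the k-mer tokenization, the `<pad>`/`<unk>` tokens, and all function names are assumptions for illustration, since the abstract does not state which tokenization or embedding the authors used.

```python
def tokenize(seq: str, k: int = 3) -> list:
    """Split a sequence into overlapping k-mer 'words'."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(sequences: list, k: int = 3) -> dict:
    """Assign an integer id to every k-mer seen in the training sequences.

    Ids 0 and 1 are reserved for padding and unknown k-mers.
    """
    vocab = {"<pad>": 0, "<unk>": 1}
    for seq in sequences:
        for token in tokenize(seq, k):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(seq: str, vocab: dict, k: int = 3, max_len: int = 8) -> list:
    """Convert a sequence to a fixed-length list of k-mer ids.

    Longer sequences are truncated and shorter ones are right-padded,
    so every output has exactly max_len entries.
    """
    ids = [vocab.get(token, vocab["<unk>"]) for token in tokenize(seq, k)]
    ids = ids[:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))
```

The fixed-length id lists produced by `encode` are the kind of input an embedding layer followed by a CNN, LSTM, or GRU classifier would typically consume.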
Is artificial intelligence deserving of all the hype?
Artificial intelligence is moving into all areas of engineering, science, business and industry; indeed, AI is now the dominant approach, pushing others to the background. Recently, DeepMind, owned by Google, demonstrated an algorithm called AlphaFold to predict the three-dimensional structure of a protein from its amino-acid sequence. This is a fundamental problem in biology. Laboratory methods are laborious and therefore progress has been slow. AlphaFold would make the process very fast and thereby greatly accelerate important applications such as discovering new drugs.
- North America > United States (0.15)
- Asia > China (0.05)
DeepMind Makes History Again By Solving a 50-Year-Old Problem In Biology
You may have heard about "DeepMind" in the past, and if you haven't, now you will. DeepMind has accumulated a number of achievements since it was founded, but it is best known for AlphaGo, an AI program that beat some of the best professional Go players in history, including Ke Jie. DeepMind's AlphaFold 2 can now identify a protein's three-dimensional structure from its amino-acid sequence to the width of an atom. To give some context, AlphaFold 2 competed with over 100 research groups worldwide in a competition known as the Critical Assessment of Protein Structure Prediction, or CASP. The goal was exactly what AlphaFold 2 achieved: to be able to predict a protein's structure from its amino-acid sequence.
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Leisure & Entertainment > Games > Go (0.60)